On Wed, Jan 24, 2024 at 1:39 PM Andres Freund <and...@anarazel.de> wrote:
> On 2024-01-23 17:26:19 -0600, Tristan Partin wrote:
> > On Tue Jan 23, 2024 at 4:23 PM CST, Andres Freund wrote:
> > > A fork() while threads are running is undefined behavior IIRC, and undefined
> > > behavior isn't limited to a single thread. You'd definitely need to use
> > > pthread_sigprocmask etc to address that aspect alone.
> >
> > If you can find a resource that explains the UB, I would be very interested
> > to read that. I found a SO[0] answer that made it seem like this actually
> > wasn't the case.
>
> I think there are safe ways to do it, but I don't think we currently reliably
> do so. It certainly wouldn't be well defined to have a thread created in
> postmaster, before backends are forked off ("the child process may only
> execute async-signal-safe operations until such time as one of the exec
> functions is called").

Right, the classic example is that if you fork() while some other
thread is in malloc() or fwrite() or whatever libc or other unknown
code, that thread might hold a mutex that will never be released in
the child.
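
To make that concrete, here is a minimal sketch of the hazard. Everything in it is made up for illustration (the names holder and child_sees_locked_mutex, the two pipes used for sequencing), and it is not PostgreSQL code. A helper thread takes a mutex, the main thread forks while it is held, and the child, which contains only the forking thread, finds the mutex locked with nobody left to unlock it. Strictly speaking, calling pthread_mutex_trylock() in the child is itself outside what POSIX promises after a multithreaded fork(), so treat this as a demonstration rather than a portable test.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* All names here are hypothetical, chosen for this illustration only. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int ready_pipe[2];       /* helper -> main: "I hold the lock" */
static int release_pipe[2];     /* main -> helper: "you may unlock now" */

static void *
holder(void *arg)
{
    char        c = 'x';

    (void) arg;
    pthread_mutex_lock(&lock);
    if (write(ready_pipe[1], &c, 1) == 1)
        (void) read(release_pipe[0], &c, 1);    /* wait while holding it */
    pthread_mutex_unlock(&lock);
    return NULL;
}

/*
 * Returns 1 if the forked child found the mutex locked, 0 if the child
 * could take it, and -1 on setup failure.
 */
int
child_sees_locked_mutex(void)
{
    pthread_t   th;
    pid_t       pid;
    char        c = 'x';
    int         status = 0;
    int         result = -1;

    if (pipe(ready_pipe) != 0 || pipe(release_pipe) != 0)
        return -1;
    if (pthread_create(&th, NULL, holder, NULL) != 0)
        return -1;
    if (read(ready_pipe[0], &c, 1) != 1)
        return -1;

    /* The helper thread now holds the mutex.  Fork in that state. */
    pid = fork();
    if (pid == 0)
    {
        /*
         * Child: the helper thread does not exist here, but the memory
         * image of the locked mutex was copied.  Not async-signal-safe,
         * used only to show the stuck state.
         */
        _exit(pthread_mutex_trylock(&lock) == EBUSY ? 1 : 0);
    }

    /* Parent: let the helper finish, then collect the child's verdict. */
    (void) write(release_pipe[1], &c, 1);
    pthread_join(th, NULL);
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = WEXITSTATUS(status);

    close(ready_pipe[0]);
    close(ready_pipe[1]);
    close(release_pipe[0]);
    close(release_pipe[1]);
    return result;
}
```

If the child had called pthread_mutex_lock() instead of the trylock variant, it would simply hang, which is the malloc()/fwrite() failure mode in miniature.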

As for what exactly might be happening in this case, I tried calling
SCDynamicStoreCopyProxies() and saw a new thread sitting in
__workq_kernreturn, which smells like libdispatch AKA GCD, Apple's
thread pool job dispatch thing.  I tried to step my way through and
follow along on Apple's github and saw plenty of uses of libdispatch
in CoreFoundation code, but not the culprit, and then I hit libxpc,
which is closed source so I lost interest.  Boo.  Then I found this
article that says some interesting stuff about all that:

https://www.evanjones.ca/fork-is-dangerous.html

The code discussed there has changed a bit since then, but it still
tries to detect unsafe forks:

https://github.com/apple-oss-distributions/libdispatch/blob/main/src/init.c

These days, I don't think the original corruption complaint that led
to that am-I-multithreaded check being added to PostgreSQL could happen
anyway, because the postmaster would now process its state machine
serially in the main thread's work loop even if a random unexpected
thread happened to run the handler.  But obviously that doesn't help
us with these other complications so that observation isn't very
interesting.

As for sigprocmask() vs pthread_sigmask(), the sources of
unspecifiedness I am aware of are: (1) Unix vendors disagreeing on
whether the former affected only the calling thread or the whole
process before threads were standardised, and we can see that they
still differ today (eg Darwin/XNU loops over all threads setting them,
while many other systems do exactly the same as pthread_sigmask()),
and (2) libcs sometimes use or defer some signals for their own
private purposes, so sometimes pthread_sigmask() has a wrapper doing
some footwork in userspace rather than just invoking the system call,
but I dunno.  In one of the threads about avoiding bad behaviour
around system(), I think there might have been some ideas about
getting rid of the need to block signals at all, which I think must be
theoretically possible if the handlers are smart enough to avoid
misbehaving in child processes, and maybe we use moar latches.
