On Wed, Jan 24, 2024 at 1:39 PM Andres Freund <and...@anarazel.de> wrote:
> On 2024-01-23 17:26:19 -0600, Tristan Partin wrote:
> > On Tue Jan 23, 2024 at 4:23 PM CST, Andres Freund wrote:
> > > A fork() while threads are running is undefined behavior IIRC, and
> > > undefined behavior isn't limited to a single thread. You'd definitely
> > > need to use pthread_sigprocmask etc to address that aspect alone.
> >
> > If you can find a resource that explains the UB, I would be very interested
> > to read that. I found a SO[0] answer that made it seem like this actually
> > wasn't the case.
>
> I think there are safe ways to do it, but I don't think we currently reliably
> do so. It certainly wouldn't be well defined to have a thread created in
> postmaster, before backends are forked off ("the child process may only
> execute async-signal-safe operations until such time as one of the exec
> functions is called").

Right, the classic example is that if you fork() while some other thread is in malloc() or fwrite() or whatever libc or other unknown code, it might hold a mutex that will never be released in the child.

As for what exactly might be happening in this case, I tried calling SCDynamicStoreCopyProxies() and saw a new thread sitting in __workq_kernreturn, which smells like libdispatch AKA GCD, Apple's thread pool job dispatch thing. I tried to step my way through and follow along on Apple's GitHub and saw plenty of uses of libdispatch in CoreFoundation code, but not the culprit, and then I hit libxpc, which is closed source, so I lost interest. Boo.

Then I found this article that says some interesting stuff about all that:

https://www.evanjones.ca/fork-is-dangerous.html

That code has changed a bit since then but still tries to detect unsafe forks:

https://github.com/apple-oss-distributions/libdispatch/blob/main/src/init.c

These days, I don't think the original corruption complaint that led to that am-I-multithreaded check being added to PostgreSQL could happen anyway, because the postmaster would now process its state machine serially in the main thread's work loop even if a random unexpected thread happened to run the handler. But obviously that doesn't help us with these other complications, so that observation isn't very interesting.

As for sigprocmask() vs pthread_sigmask(), the sources of unspecifiedness I am aware of are:

(1) Unix vendors disagreed on whether the former affected only the calling thread or the whole process before threads were standardised, and we can see that they still differ today (eg Darwin/XNU loops over all threads setting them, while many other systems do exactly the same as pthread_sigmask()), and

(2) libcs sometimes use or defer some signals for their own private purposes, so sometimes pthread_sigmask() has a wrapper doing some footwork in userspace rather than just invoking the system call, but I dunno.

In one of the threads about avoiding bad behaviour around system(), I think there might have been some ideas about getting rid of the need to block signals at all, which I think must be theoretically possible if the handlers are smart enough to avoid misbehaving in child processes, and maybe we use moar latches.