On Sat, Jul 2, 2022 at 9:06 AM Andres Freund <and...@anarazel.de> wrote: > On 2022-07-01 13:29:44 -0700, Andres Freund wrote: > > On 2022-07-01 19:55:16 +0200, Alvaro Herrera wrote: > > > Allow DSM allocation to be interrupted. > > > > > > Chris Travers reported that the startup process can repeatedly try to > > > cancel a backend that is in a posix_fallocate()/EINTR loop and cause > > > it > > > to loop forever. Teach the retry loop to give up if an interrupt is > > > pending. Don't actually check for interrupts in that loop though, > > > because a non-local exit would skip some clean-up code in the caller. > > > > That whole approach seems quite wrong to me. At the absolute very least the > > code needs to check if interrupts are being processed in the current context > > before just giving up due to ProcDiePending || QueryCancelPending. > > > > I'm very unconvinced this ought to be fixed in dsm_impl_posix_resize(), > > rather > > than the startup process signalling.
I agree it's not great. It was a back-patchable bandaid in need of a better solution. > Chris, do you have any additional details about the machine that lead to this > change? OS version, whether it might have been swapping, etc? > > I wonder if what happened is that posix_fallocate() used glibc's fallback > implementation because the kernel was old enough to not support fallocate() > for tmpfs. Looks like support for fallocate() for tmpfs was added in 3.5 > ([1]). So e.g. a rhel 6 wouldn't have had that. With a quick test program on my Linux 5.10 kernel I see that an SA_RESTART signal handler definitely causes posix_fallocate() to return EINTR (can post trivial program). A drive-by look at the current/modern kernel source supports this: shmem_fallocate returns -EINTR directly (not -ERESTARTSYS, which seems to be the Linux-y way to say you want EINTR or restart as appropriate?), and it also undoes all partial progress too (not too surprising), which would explain why a perfectly timed machine gun stream of signals from our recovery conflict system can make an fallocate retry loop never terminate, for large enough sizes.